Reconstruction of obscured views of captured imagery using arbitrary captured inputs

ABSTRACT

An imagery processing system obtains capture inputs from capture devices that might have capture parameters and characteristics that differ from those of a main imagery capture device. By normalizing outputs of those capture devices, potentially arbitrary capture devices could be used for reconstructing portions of a scene captured by the main imagery capture device when reconstructing a plate of the scene to replace an object in the scene with what the object obscured in the scene. Reconstruction could be of one main image, a stereo pair of images, or some number, N, of images where N>2.

CROSS REFERENCES TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 62/983,528, entitled RECONSTRUCTION OF OBSCURED VIEWS OF CAPTURED IMAGERY USING ARBITRARY CAPTURED INPUTS, filed on Feb. 28, 2020, which is hereby incorporated by reference as if set forth in full in this application for all purposes.

This application is related to the following applications which are hereby incorporated by reference as if set forth in full in this application for all purposes:

U.S. patent application Ser. No. ______, entitled IMAGE PROCESSING FOR REDUCING ARTIFACTS CAUSED BY REMOVAL OF SCENE ELEMENTS FROM IMAGES (WD0005US1), filed on ______;

U.S. patent application Ser. No. ______, entitled COMPUTER-GENERATED IMAGE PROCESSING INCLUDING VOLUMETRIC SCENE RECONSTRUCTION (WD0008US1), filed on ______; and

U.S. patent application Ser. No. ______, entitled RECONSTRUCTION OF OBSCURED VIEWS IN CAPTURED IMAGERY USING PIXEL REPLACEMENT FROM SECONDARY IMAGERY (WD0009US1), filed on ______.

FIELD OF THE INVENTION

The present disclosure generally relates to digital image manipulation. The disclosure relates more particularly to apparatus and techniques for reconstructing portions of images obscured in a main image capture with inputs provided by other scene capture data.

BACKGROUND

In modern digital imagery creation (still images, video sequences of frames of images), there is often a desire to change what is captured by a camera in order to convey something different. This might be the case where a camera captures a scene in which two actors are acting and a content creator later determines that one of the actors is to be removed from the captured video. The result would be a video sequence in which the removed actor is not present and that instead shows what was behind the removed actor, shows a computer-generated character or object in place of the removed actor, or is altered for other reasons.

Viewer expectations are that artifacts of the removal from a captured video sequence not be readily apparent. Simply removing the pixels corresponding to the removed character would leave a blank spot in the video. Simply replacing those pixels with a generic background would leave artifacts at the boundary between pixels that were part of the removed character and pixels nearby. With sufficient time, effort and computing power, an artist might manually “paint” the pixels in each frame of the video where the removed character was, but it can be time consuming and tedious to reach the point where viewers do not perceive an artifact of the removal.

Tools for more simply performing manipulation of imagery data would be useful.

SUMMARY

An imagery processing system obtains capture inputs from capture devices that might have capture parameters and characteristics that differ from those of a main imagery capture device. By normalizing outputs of those capture devices, potentially arbitrary capture devices could be used for reconstructing portions of a scene captured by the main imagery capture device when reconstructing a plate of the scene to replace an object in the scene with what the object obscured in the scene. Reconstruction could be of one main image, a stereo pair of images, or some number, N, of images where N>2.

The following detailed description together with the accompanying drawings will provide a better understanding of the nature and advantages of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:

FIG. 1 illustrates an environment in which imagery and data about a scene might be captured, from a top view, according to various embodiments.

FIG. 2 illustrates a stage, from a top view, in which a scene is captured and has several possible plates of the scene that might be used in generating reconstructed imagery of what would be visible, according to various embodiments.

FIG. 3 is a side view of a scene that might include occlusions to be reconstructed, according to various embodiments.

FIG. 4 is a block diagram of a system for creating reconstructed imagery from captured imagery of a scene and arbitrary inputs captured from the scene, according to various embodiments.

FIG. 5 is a flowchart of a process for reconstructing imagery from captured imagery of a scene and arbitrary inputs captured from the scene, according to various embodiments.

FIG. 6 illustrates an example visual content generation system as might be used to generate imagery in the form of still images and/or video sequences of images, according to various embodiments.

FIG. 7 is a block diagram illustrating an example computer system upon which computer systems of the systems illustrated in FIGS. 1 and 6 may be implemented.

DETAILED DESCRIPTION

In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described.

Techniques described and suggested herein include generating modified video from captured video of a scene and additional inputs related to the scene, where the modified video is digitally modified to replace all or portions of objects in the scene recorded in the captured video (i.e., “original video”). It should be understood that examples described with reference to video sequences can apply to single or still images, unless otherwise indicated. A scene might comprise various objects and actors appearing in the scene, possibly moving, possibly being subject to lighting changes and/or camera movements. Herein, where an object is described as including an object that is visible in the scene or not visible in the scene, the teaching might also apply to human and/or non-human actors. Thus, a step in a process that captures an image of a scene and then processes the digitally captured video to remove an actor from the scene and reconstruct what was supposed to be behind that actor might also be used for removing inanimate or non-actor objects from the scene.

Rationales for modifying a video post-capture can vary and many of the techniques described herein work well regardless of the rationale. One rationale is that a scene is to be captured with three actors interacting where one of the actors is outfitted with motion capture (“mo-cap”) fiducials (contrasting markers, paint, etc.) and the modified video will have a computer-generated character moving in the scene in place of the mo-cap actor, such as where the computer-generated character is a non-human character. Another rationale might be that a video of a scene is captured and in post-production, a director changes a plot and that change requires that some character or object not be present even though it is present in the original captured imagery. Yet another rationale is the discovery of filming errors that need to be corrected and a scene cannot be easily reshot.

In a more general case, one or more images are recreated, which could be from a reconstructed plate or a set of normalized, aligned planes from which an image can be reconstructed with objects filled in or removed.

FIG. 1 illustrates an environment in which imagery and data about a scene might be captured, from a top view, according to various embodiments. FIG. 1 is an approximately top view of a stage 102 on which there are actors 104 and 106 and other objects 108, 110, and 112. Action and the scene might be captured by camera 120, which might be movable on a track 122. A background wall 124 might provide content of the scene that is captured by camera 120, and a green screen 126 might also be present and visible in the scene. As is known, green screens can be added to scenes to facilitate the insertion of content into a frame where that content does not exist in the scene, but is added post-capture of the scene. Camera 120 might be a main camera, sometimes referred to as a “hero” camera, that is expected to capture the bulk of the scene. In some variations, multiple hero cameras are used to allow for cutting from one view of the scene to another quickly.

In the digital video captured by camera 120 (or later digitized video derived from analog filming of the scene), for the indicated position of camera 120 on track 122, actor 106 would be partially obscured in the video by actor 104 and object 110, while background wall 124 is partially obscured by object 112. To provide a director an option to cast the scene without actor 104 or object 112, the director could request that the entire scene be shot a second time without actor 104 and object 112, but often such decisions are not made until after the scene is shot and the actors, objects or environment may no longer be available. Artists could manually paint frames to remove an object, but that can be time consuming to get right.

To provide information for an automated plate reconstruction, additional devices might be deployed on or about stage 102 to gather data that can be used for reconstruction. For example, witness cameras 130, 132 might be deployed to capture black and white, high resolution, low resolution, infrared or other particular wavelengths and resolutions of what is happening in the scene. A Lidar or similar device 140 might also be deployed to capture point clouds of distances to objects.

Herein, a plate might be a planar surface (which might or might not relate to a physical surface) that intersects a view space of a camera. In FIG. 1, plates 150, 152 cross the view of camera 120. A plate 154 intersects a view from witness camera 132. Although, in this example, the plates in FIG. 1 are shown perpendicular to a central axis of a view frustum of camera 120, that need not be the case in other applications as plates can have other desired orientations. In some embodiments, a plate can have depth and can define a volume instead of, or in addition to, one or more planar surfaces. In general, operations and properties described herein for two-dimensional images may be applicable to three-dimensional volumes. For example, capturing, manipulating, rendering or otherwise processing two-dimensional items, such as images, frames, pixels, etc., can apply to three-dimensional items such as models, settings, voxels, etc. unless otherwise indicated.

It may be that a director or artist desires to use computerized imagery editing tools to edit captured video from camera 120 such that the plate of interest is one of the plates crossing the view of camera 120, such as plate 152. In that case, editing might involve not only removing pixels from frames that correspond to actor 104, but also filling in pixel color values for those pixels with what would have been captured by camera 120 for those pixels but for the obscuring effects of the opacity of actor 104 and object 110.
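The pixel fill described above can be sketched as a masked replacement. This is only an illustrative sketch, not the disclosed reconstruction: it assumes a per-pixel removal mask and an already reconstructed plate frame are available, and the names fill_removed_pixels, removal_mask, and plate_frame are hypothetical.

```python
def fill_removed_pixels(main_frame, removal_mask, plate_frame):
    """Return a new frame in which masked pixels come from plate_frame.

    All three arguments are row-major lists of rows with equal dimensions.
    removal_mask holds True where a pixel belongs to an object being removed.
    """
    filled = []
    for main_row, mask_row, plate_row in zip(main_frame, removal_mask, plate_frame):
        filled.append([
            plate_px if masked else main_px
            for main_px, masked, plate_px in zip(main_row, mask_row, plate_row)
        ])
    return filled


# Hypothetical 2x2 RGB frame: the right-hand column is covered by the removed actor.
main = [[(10, 10, 10), (200, 0, 0)],
        [(10, 10, 10), (200, 0, 0)]]
mask = [[False, True],
        [False, True]]
plate = [[(0, 0, 0), (12, 11, 10)],
         [(0, 0, 0), (13, 11, 10)]]
filled = fill_removed_pixels(main, mask, plate)
```

Unmasked pixels pass through unchanged, so only the region once occupied by the removed object depends on the reconstructed plate.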

FIG. 2 illustrates a stage 202, from a top view, in which a scene is captured and has several possible plates 204(1)-(4) of the scene that might be used in generating reconstructed imagery of what would be visible and that uses various cameras. As illustrated, cameras 206(1)-(3) might be identically configured cameras, while camera 208 is configured differently. Such an arrangement, unless existing for other reasons, might make reconstruction impractical, whereas an arrangement of FIG. 1 might not add complexity if the various different capture devices are already in place for other reasons. In FIG. 2, camera 208 might be placed and optimized for motion capture of action on the stage, such as where one or more of objects 212(1)-(5) present on stage 202 is outfitted for motion capture. It would be efficient if inputs from camera 208 could be used for plate reconstruction, but quite often the information gathered, sensitivity, position, lighting, etc. are uncoordinated with those elements of cameras 206(1)-(3).

FIG. 3 is a side view of a scene that might include occlusions to be reconstructed. In a captured scene 302, a person 304 is between house 306 and a camera that captured the image. A plate reconstruction process might be used to generate, from a video sequence that includes person 304 walking in front of house 306, a reconstructed video of a plate that is behind person 304 so that, for example, the reconstructed video would display a window 308 on house 306 unobscured by person 304 even though the main camera did not capture all of the pixels that would make up a view of window 308.

FIG. 4 is a block diagram of a system 400 for creating reconstructed imagery from captured imagery of a scene and arbitrary inputs captured from the scene. An advantage of allowing for arbitrary types of input is that preexisting devices or devices added for other purposes can be used for reconstruction. In part, system 400 can be used for reconstructing imagery for captured scenes when editing is done to remove objects from the scene that were present when captured. As illustrated, main camera video 402 is stored into main scene capture storage 404. Arbitrary inputs 406 can be obtained from other capture devices (mo-cap cameras, contrast cameras, stereo capture devices, Lidar, light sensors, environmental sensors, etc.). A preprocessor 410 obtains reference inputs 412, reference stage parameters 414, and capture device positions/settings 416 and processes those to generate normalizing parameters that can be stored in normalizing parameter storage 420.

Reference inputs 412 might include capture device readings obtained of a stage in the absence of objects. For example, a Lidar sensor might take readings of a stage to be able to determine distances to fixed backgrounds and the like, while an optical density capture device might measure a quiescent optical density in the absence of activity. Reference stage parameters 414 might include measurements made of the stage itself, such as its lighting independent of a capture device, while capture device positions/settings 416 might include calibration settings and positions of capture devices relative to a stage. It should be understood that the stage need not be a physical stage, but might be some other environment within which a scene to be captured can occur. For example, where a scene is to be shot of actors in battle outdoors, the stage might be an open field and the cameras and sensing devices might be placed relative to that open field to capture the visual action and capture device inputs.

Normalizing parameters are provided to a normalizer 430 that can process the arbitrary inputs 406 to generate normalized inputs, which can be stored in a normalized capture data storage 432. The normalized inputs might be such that they can be used to fill in portions of a stage in a scene that was captured with a main camera that are portions not captured in the main camera imagery due to being obscured by objects that are to be removed from the captured imagery. But one example of normalization would be to modify inputs from another image capture device that was capturing light from the scene while the main camera was capturing the main action, but where lighting, colors, and other factors would result in the other image capture device capturing pixel color values that are not matched with what would have been captured by the main camera for the plate but for the obscuring objects.
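One simple way to picture the color-matching example of normalization is a per-channel linear mapping fitted from reference readings that both devices took of the same stage regions. The sketch below is an assumption made for illustration; the disclosure does not specify a linear model, and the function names and sample values are hypothetical.

```python
def fit_gain_offset(witness_values, main_values):
    """Least-squares fit of main ~= gain * witness + offset for one color channel."""
    n = len(witness_values)
    mean_w = sum(witness_values) / n
    mean_m = sum(main_values) / n
    var_w = sum((w - mean_w) ** 2 for w in witness_values)
    cov = sum((w - mean_w) * (m - mean_m)
              for w, m in zip(witness_values, main_values))
    gain = cov / var_w  # assumes the reference readings are not all identical
    offset = mean_m - gain * mean_w
    return gain, offset


def normalize_value(value, gain, offset, max_value=255):
    """Map a witness-camera channel value into the main camera's range."""
    return min(max_value, max(0, round(gain * value + offset)))


# Hypothetical reference readings of the same three stage patches (one channel).
witness_ref = [10, 50, 90]
main_ref = [25, 105, 185]
gain, offset = fit_gain_offset(witness_ref, main_ref)
normalized = normalize_value(60, gain, offset)
```

In this picture, the fitted gain and offset play the role of normalizing parameters that are computed once from reference inputs and then applied to inputs captured during the scene.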

Reconstructing a plate from the main camera capture and normalized inputs from other capture devices might not be straightforward. In such cases, a machine-learning reconstructor 440 might take as inputs reconstruction parameters 442, reconstruction input selection 444, normalized capture data from storage 432, and main scene imagery from storage 404. Machine-learning reconstructor 440 might be trained on video with known values for what should be reconstructed. Once trained, machine-learning reconstructor 440 can output, from those inputs, reconstructed imagery 450. In an embodiment, reconstructed (i.e., modified) imagery 450 corresponds to the main camera video 402, but where portions of a scene that were obstructed by objects to be removed are reconstructed so as to appear as if those removed objects were not present in the scene when it was captured.

FIG. 5 is a flowchart of a process for reconstructing imagery from captured imagery of a scene and arbitrary inputs captured from the scene. The process might be used for plate reconstruction from inputs that are not necessarily tied to the details of a camera that is capturing a main view of the scene. The process might be performed by an image processing system or as part of a larger studio content creation system that might comprise a stage, props, cameras, objects on scene, computer processors, storage, and artist and other user interfaces for working with content that is captured within the studio content creation system. In examples below, the process will be described with reference to an imagery creation system capable of capturing images and/or video and modifying the resulting captured imagery, with or without human user input.

In a first step, step 502, the imagery creation system specifies reference stage parameters for capture devices. These parameters might relate to what capture devices are being used, where they are located, etc. These parameters might be provided by users based on experience or by the imagery creation system performing computations to determine what parameters might be needed.

In step 504, the imagery creation system configures capture devices, such as setting Lidar devices to particular settings, zoom levels, etc.

In step 506, the imagery creation system processes the parameters and settings to try to normalize reference inputs. For example, where a low resolution camera is used as a witness camera to the side of a scene (e.g., witness camera 132), normalization might be to interpolate output of the witness camera in order to match a higher resolution of a main camera.
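The resolution-matching example can be illustrated with bilinear interpolation of a single-channel image. The choice of bilinear interpolation and the helper name upsample_bilinear are assumptions made for this sketch; the text says only that witness camera output might be interpolated.

```python
def upsample_bilinear(image, out_h, out_w):
    """Resample a 2-D list of numbers to out_h x out_w with bilinear weights."""
    in_h, in_w = len(image), len(image[0])
    result = []
    for y in range(out_h):
        # Map the output row to a fractional input row so that corners align.
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        ty = fy - y0
        row = []
        for x in range(out_w):
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            tx = fx - x0
            # Blend horizontally on the two neighboring rows, then vertically.
            top = image[y0][x0] * (1 - tx) + image[y0][x1] * tx
            bottom = image[y1][x0] * (1 - tx) + image[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        result.append(row)
    return result


# Hypothetical 2x2 witness image upsampled to 3x3.
low_res = [[0, 100],
           [100, 200]]
high_res = upsample_bilinear(low_res, 3, 3)
```

A practical normalizer would likely also account for lens distortion and viewpoint differences, which this sketch ignores.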

At step 508, the imagery creation system checks whether the reference inputs are normalizable. If not, the imagery creation system returns to step 502. As an example, the imagery creation system might determine that given the placement of various capture devices, it would not be possible to normalize outputs of those capture devices. An extreme case might be where witness cameras are placed such that some object entirely blocks their view. When returning to step 502, the imagery creation system might flag that as a problem to be corrected, or a human user such as a set layout manager might determine that the witness cameras should be repositioned and would specify different reference stage parameters to reflect the new positions.

If the imagery creation system determines that the scene is normalizable (or partly normalizable, or normalizable to within predetermined thresholds or ranges), the process continues at step 510 and the imagery creation system stores the determined normalizing parameters for later use, such as in storage 420 illustrated in FIG. 4. The imagery creation system can then, at step 512, capture stage inputs with a plurality of capture devices. For example, the scene can be acted out, recorded with a main camera and other data captured from the capture devices. Optionally, the imagery creation system might configure capture devices based on the normalization parameters (step 514). The imagery creation system can then, at step 516, normalize captured inputs using normalization parameters, obtain reconstruction selections (step 518), and reconstruct a plate based on the selections, the normalized captured inputs, and the reconstruction parameters (step 520). In some embodiments, previously recorded imagery or data can be used.
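The setup loop of steps 502 through 510 can be sketched as trying candidate layouts until one yields normalizing parameters. The layout dictionaries, the blocked-view test, and the function names below are hypothetical stand-ins for illustration and are not part of the disclosed system.

```python
def choose_normalizable_layout(layouts, try_normalize):
    """Return (layout, params) for the first layout that can be normalized.

    try_normalize stands in for steps 504-508: it returns normalizing
    parameters, or None when the reference inputs are not normalizable.
    """
    for layout in layouts:              # step 502: specify stage parameters
        params = try_normalize(layout)  # steps 504-506: configure and normalize
        if params is not None:          # step 508: normalizable?
            return layout, params       # step 510: caller stores the params
    return None, None


def example_try_normalize(layout):
    """Hypothetical check: fail when the witness camera's view is blocked."""
    if layout["witness_view_blocked"]:
        return None
    return {"scale": layout["main_width"] / layout["witness_width"]}


layouts = [
    {"name": "A", "witness_view_blocked": True,
     "main_width": 1920, "witness_width": 960},
    {"name": "B", "witness_view_blocked": False,
     "main_width": 1920, "witness_width": 960},
]
chosen, params = choose_normalizable_layout(layouts, example_try_normalize)
```

Here layout A is rejected because its witness view is blocked, which mirrors the extreme case described above, and layout B supplies the parameters used later at step 516.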

The various embodiments further can be implemented in a wide variety of operating environments, which in some cases can include one or more user computers, computing devices or processing devices that can be used to operate any of a number of applications. Such a system also can include a number of workstations running any of a variety of commercially available operating systems and other known applications. These devices also can include virtual devices such as virtual machines, hypervisors and other virtual devices capable of communicating via a network.

Note that, in the context of describing disclosed embodiments, unless otherwise specified, use of expressions regarding executable instructions (also referred to as code, applications, agents, etc.) performing operations that “instructions” do not ordinarily perform unaided (e.g., transmission of data, calculations, etc.) denotes that the instructions are being executed by a machine, thereby causing the machine to perform the specified operations.

According to one embodiment, the techniques described herein are implemented by one or more generalized computing systems programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Special-purpose computing devices may be used, such as desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.

For example, FIG. 6 illustrates an example of visual content generation system 600 as might be used to generate imagery in the form of still images and/or video sequences of images. Visual content generation system 600 might generate imagery of live action scenes, computer generated scenes, or a combination thereof. In a practical system, users are provided with tools that allow them to specify, at high levels and low levels where necessary, what is to go into that imagery. For example, a user might be an animation artist and might use the visual content generation system 600 to capture interaction between two human actors performing live on a sound stage and replace one of the human actors with a computer-generated anthropomorphic non-human being that behaves in ways that mimic the replaced human actor's movements and mannerisms, and then add in a third computer-generated character and background scene elements that are computer-generated, all in order to tell a desired story or generate desired imagery.

Still images that are output by the visual content generation system 600 might be represented in computer memory as pixel arrays, such as a two-dimensional array of pixel color values, each associated with a pixel having a position in a two-dimensional image array. Pixel color values might be represented by three or more (or fewer) color values per pixel, such as a red value, a green value, and a blue value (e.g., in RGB format). Dimensions of such a two-dimensional array of pixel color values might correspond to a preferred and/or standard display scheme, such as 1920-pixel columns by 1280-pixel rows. Images might or might not be stored in a compressed format, but either way, a desired image may be represented as a two-dimensional array of pixel color values. In another variation, images are represented by a pair of stereo images for three-dimensional presentations and in other variations, some of the image output, or all of it, might represent three-dimensional imagery instead of just two-dimensional views.

A stored video sequence might include a plurality of images such as the still images described above, but where each image of the plurality of images has a place in a timing sequence and the stored video sequence is arranged so that when each image is displayed in order, at a time indicated by the timing sequence, the display presents what appears to be moving and/or changing imagery. In one representation, each image of the plurality of images is a video frame having a specified frame number that corresponds to an amount of time that would elapse from when a video sequence begins playing until that specified frame is displayed. A frame rate might be used to describe how many frames of the stored video sequence are displayed per unit time. Example video sequences might include 24 frames per second (24 FPS), 50 FPS, 140 FPS, or other frame rates. In some embodiments, frames are interlaced or otherwise presented for display, but for clarity of description, in some examples, it is assumed that a video frame has one specified display time, but other variations might be contemplated.
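The relationship between a frame number, a frame rate, and a display time can be written out directly. The sketch below assumes frames are numbered from zero and that each frame has a single display time; the function name frame_display_time is hypothetical.

```python
from fractions import Fraction


def frame_display_time(frame_number, frames_per_second):
    """Seconds from the start of playback until the given frame is shown.

    Assumes frame 0 is displayed at time 0. Exact rational arithmetic avoids
    rounding drift on long sequences.
    """
    return Fraction(frame_number) / Fraction(frames_per_second)


two_seconds = frame_display_time(48, 24)   # 48 frames at 24 FPS
one_frame_at_50 = frame_display_time(1, 50)
```

A non-integer frame rate could be supplied as a Fraction, for example Fraction(24000, 1001), to keep the arithmetic exact.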

One method of creating a video sequence is to simply use a video camera to record a live action scene, i.e., events that physically occur and can be recorded by a video camera. The events being recorded can be events to be interpreted as viewed (such as seeing two human actors talk to each other) and/or can include events to be interpreted differently due to clever camera operations (such as moving actors about a stage to make one appear larger than the other despite the actors actually being of similar build, or using miniature objects with other miniature objects so as to be interpreted as a scene containing life-sized objects).

Creating video sequences for story-telling or other purposes often calls for scenes that cannot be created with live actors, such as a talking tree, an anthropomorphic object, space battles, and the like. Such video sequences might be generated computationally rather than capturing light from live scenes. In some instances, an entirety of a video sequence might be generated computationally, as in the case of a computer-animated feature film. In some video sequences, it is desirable to have some computer-generated imagery and some live action, perhaps with some careful merging of the two.

While computer-generated imagery might be creatable by manually specifying each color value for each pixel in each frame, this is likely too tedious to be practical. As a result, a creator uses various tools to specify the imagery at a higher level. As an example, an artist might specify the positions in a scene space, such as a three-dimensional coordinate system, of objects and/or lighting, as well as a camera viewpoint, and a camera view plane. From that, a rendering engine could take all of those as inputs, and compute each of the pixel color values in each of the frames. In another example, an artist specifies position and movement of an articulated object having some specified texture rather than specifying the color of each pixel representing that articulated object in each frame.

In a specific example, a rendering engine performs ray tracing wherein a pixel color value is determined by computing which objects lie along a ray traced in the scene space from the camera viewpoint through a point or portion of the camera view plane that corresponds to that pixel. For example, a camera view plane might be represented as a rectangle having a position in the scene space that is divided into a grid corresponding to the pixels of the ultimate image to be generated, and if a ray defined by the camera viewpoint in the scene space and a given pixel in that grid first intersects a solid, opaque, blue object, that given pixel is assigned the color blue. Of course, for modern computer-generated imagery, determining pixel colors, and thereby generating imagery, can be more complicated, as there are lighting issues, reflections, interpolations, and other considerations.
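The first-intersection rule described above can be shown with a very small ray caster. The sketch assumes opaque spheres as the only objects, a camera viewpoint at the origin outside every sphere, a square view plane at z = 1, and flat colors with no lighting; all names and scene values are hypothetical.

```python
import math


def first_hit_color(origin, direction, spheres, background=(0, 0, 0)):
    """Color of the nearest sphere hit by the ray, or background if none is hit."""
    nearest_t, color = math.inf, background
    for center, radius, sphere_color in spheres:
        oc = [o - c for o, c in zip(origin, center)]
        a = sum(d * d for d in direction)
        b = 2 * sum(d * o for d, o in zip(direction, oc))
        c = sum(o * o for o in oc) - radius * radius
        disc = b * b - 4 * a * c
        if disc < 0:
            continue  # the ray misses this sphere
        t = (-b - math.sqrt(disc)) / (2 * a)  # nearer of the two intersections
        if 0 < t < nearest_t:
            nearest_t, color = t, sphere_color
    return color


def render(width, height, spheres):
    """Cast one ray per pixel through a view plane at z = 1 spanning [-1, 1]."""
    eye = (0.0, 0.0, 0.0)
    image = []
    for row in range(height):
        line = []
        for col in range(width):
            # Center of this pixel's cell in the view-plane grid.
            px = (col + 0.5) / width * 2 - 1
            py = 1 - (row + 0.5) / height * 2
            line.append(first_hit_color(eye, (px, py, 1.0), spheres))
        image.append(line)
    return image


BLUE, RED = (0, 0, 255), (255, 0, 0)
# A small blue sphere in front of a larger red sphere.
spheres = [((0, 0, 5), 1.0, BLUE), ((0, 0, 12), 6.0, RED)]
image = render(5, 5, spheres)
```

The center pixel is blue because the blue sphere is the first object its ray meets, while a neighboring pixel whose ray passes beside the blue sphere takes the color of the red sphere behind it.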

As illustrated in FIG. 6, a live action capture system 602 captures a live scene that plays out on a stage 604. The live action capture system 602 is described herein in greater detail, but might include computer processing capabilities, image processing capabilities, one or more processors, program code storage for storing program instructions executable by the one or more processors, as well as user input devices and user output devices, not all of which are shown.

In a specific live action capture system, cameras 606(1) and 606(2) capture the scene, while in some systems, there might be other sensor(s) 608 that capture information from the live scene (e.g., infrared cameras, infrared sensors, motion capture (“mo-cap”) detectors, etc.). On the stage 604, there might be human actors, animal actors, inanimate objects, background objects, and possibly an object such as a green screen 610 that is designed to be captured in a live scene recording in such a way that it is easily overlaid with computer-generated imagery. The stage 604 might also contain objects that serve as fiducials, such as fiducials 612(1)-(3), that might be used post-capture to determine where an object was during capture. A live action scene might be illuminated by one or more lights, such as an overhead light 614.

During or following the capture of a live action scene, the live action capture system 602 might output live action footage to a live action footage storage 620. A live action processing system 622 might process live action footage to generate data about that live action footage and store that data into a live action metadata storage 624. The live action processing system 622 might include computer processing capabilities, image processing capabilities, one or more processors, program code storage for storing program instructions executable by the one or more processors, as well as user input devices and user output devices, not all of which are shown. The live action processing system 622 might process live action footage to determine boundaries of objects in a frame or multiple frames, determine locations of objects in a live action scene, where a camera was relative to some action, distances between moving objects and fiducials, etc. Where elements have sensors attached to them or are detected, the metadata might include location, color, and intensity of the overhead light 614, as that might be useful in post-processing to match computer-generated lighting on objects that are computer-generated and overlaid on the live action footage. The live action processing system 622 might operate autonomously, perhaps based on predetermined program instructions, to generate and output the live action metadata upon receiving and inputting the live action footage. The live action footage can be camera-captured data as well as data from other sensors.

An animation creation system 630 is another part of the visual content generation system 600. The animation creation system 630 might include computer processing capabilities, image processing capabilities, one or more processors, program code storage for storing program instructions executable by the one or more processors, as well as user input devices and user output devices, not all of which are shown. The animation creation system 630 might be used by animation artists, managers, and others to specify details, perhaps programmatically and/or interactively, of imagery to be generated. From user input and data from a database or other data source, indicated as a data store 632, the animation creation system 630 might generate and output data representing objects (e.g., a horse, a human, a ball, a teapot, a cloud, a light source, a texture, etc.) to an object storage 634, generate and output data representing a scene into a scene description storage 636, and/or generate and output data representing animation sequences to an animation sequence storage 638.

Scene data might indicate locations of objects and other visual elements, values of their parameters, lighting, camera location, camera view plane, and other details that a rendering engine 650 might use to render CGI imagery. For example, scene data might include the locations of several articulated characters, background objects, lighting, etc. specified in a two-dimensional space, three-dimensional space, or other dimensional space (such as a 2.5-dimensional space, three-quarter dimensions, pseudo-3D spaces, etc.) along with locations of a camera viewpoint and view plane from which to render imagery. For example, scene data might indicate that there is to be a red, fuzzy, talking dog in the right half of a video and a stationary tree in the left half of the video, all illuminated by a bright point light source that is above and behind the camera viewpoint. In some cases, the camera viewpoint is not explicit, but can be determined from a viewing frustum. In the case of imagery that is to be rendered to a rectangular view, the frustum would be a truncated pyramid. Other shapes for a rendered view are possible and the camera view plane could be different for different shapes.

The animation creation system 630 might be interactive, allowing a user to read in animation sequences, scene descriptions, object details, etc. and edit those, possibly returning them to storage to update or replace existing data. As an example, an operator might read in objects from the object storage 634 into a baking processor that would transform those objects into simpler forms and return those to the object storage 634 as new or different objects. For example, an operator might read in an object that has dozens of specified parameters (movable joints, color options, textures, etc.), select some values for those parameters and then save a baked object that is a simplified object with now fixed values for those parameters.
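
The baking step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the baking processor itself: the class names `ParametricObject` and `BakedObject`, the `bake` function, and the choice to default an unselected parameter to its first allowed value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence


@dataclass(frozen=True)
class ParametricObject:
    """An object with adjustable parameters (hypothetical representation)."""
    name: str
    options: Dict[str, Sequence[Any]]  # parameter name -> allowed values


@dataclass(frozen=True)
class BakedObject:
    """A simplified object whose parameter values are fixed."""
    name: str
    values: Dict[str, Any]


def bake(obj: ParametricObject, choices: Dict[str, Any]) -> BakedObject:
    """Fix every parameter of obj, using the operator's choices where given."""
    unknown = set(choices) - set(obj.options)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    values = {}
    for param, allowed in obj.options.items():
        # Assumption: an unselected parameter falls back to its first option.
        value = choices.get(param, allowed[0])
        if value not in allowed:
            raise ValueError(f"{value!r} is not allowed for {param!r}")
        values[param] = value
    return BakedObject(name=obj.name + "_baked", values=values)
```

An operator might call `bake(teapot, {"color": "blue"})` and return the result to storage as a new object alongside the original.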

Rather than requiring user specification of each detail of a scene, data from the data store 632 might be used to drive object presentation. For example, if an artist is creating an animation of a spaceship passing over the surface of the Earth, instead of manually drawing or specifying a coastline, the artist might specify that the animation creation system 630 is to read data from the data store 632 in a file containing coordinates of Earth coastlines and generate background elements of a scene using that coastline data.

Animation sequence data might be in the form of time series of data for control points of an object that has attributes that are controllable. For example, an object might be a humanoid character with limbs and joints that are movable in manners similar to typical human movements. An artist can specify an animation sequence at a high level, such as “the left hand moves from location (X1, Y1, Z1) to (X2, Y2, Z2) over time T1 to T2”, at a lower level (e.g., “move the elbow joint 2.5 degrees per frame”) or even at a very high level (e.g., “character A should move, consistent with the laws of physics that are given for this scene, from point P1 to point P2 along a specified path”).
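
A high-level specification such as “the left hand moves from location (X1, Y1, Z1) to (X2, Y2, Z2) over time T1 to T2” could be expanded into a time series for a control point. The following is a minimal sketch, assuming straight-line motion at constant speed (the text does not specify the interpolation); the function name `interpolate_position` is hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def interpolate_position(p1: Vec3, p2: Vec3, t1: float, t2: float, t: float) -> Vec3:
    """Position of a control point at time t, moving from p1 (at t1) to p2 (at t2).

    Assumption: linear interpolation; times outside [t1, t2] are clamped.
    """
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    u = min(max((t - t1) / (t2 - t1), 0.0), 1.0)  # normalized progress in [0, 1]
    return tuple(a + u * (b - a) for a, b in zip(p1, p2))


# Sample the motion once per frame to build a time series of positions.
frames = [interpolate_position((0, 0, 0), (2, 4, 6), 0.0, 10.0, float(f)) for f in range(11)]
```

An easing curve or a physics-based solver could replace the linear term without changing the shape of the resulting time series.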

Animation sequences in an animated scene might be specified by what happens in a live action scene. An animation driver generator 644 might read in live action metadata, such as data representing movements and positions of body parts of a live actor during a live action scene. The animation driver generator 644 might generate corresponding animation parameters to be stored in the animation sequence storage 638 for use in animating a CGI object. This can be useful where a human actor is captured in a live action scene while wearing mo-cap fiducials (e.g., high-contrast markers outside actor clothing, high-visibility paint on actor skin, face, etc.) and the movement of those fiducials is determined by the live action processing system 622. The animation driver generator 644 might convert that movement data into specifications of how joints of an articulated CGI character are to move over time.
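
One way that fiducial movement data might be converted into a joint specification is to compute the angle at a joint from three tracked marker positions. This is a minimal sketch under that assumption, not the method of the animation driver generator 644; the function name `joint_angle_degrees` and the shoulder/elbow/wrist marker layout are hypothetical.

```python
import math
from typing import Sequence


def joint_angle_degrees(a: Sequence[float], joint: Sequence[float], b: Sequence[float]) -> float:
    """Angle in degrees at `joint` between the segments joint->a and joint->b."""
    v1 = [p - q for p, q in zip(a, joint)]
    v2 = [p - q for p, q in zip(b, joint)]
    n1 = math.sqrt(sum(c * c for c in v1))
    n2 = math.sqrt(sum(c * c for c in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("markers must not coincide with the joint")
    cos_angle = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


# Per-frame (shoulder, elbow, wrist) marker positions -> elbow angle over time.
captured = [
    ((0, 1, 0), (0, 0, 0), (1, 0, 0)),   # arm bent at a right angle
    ((0, 1, 0), (0, 0, 0), (0, -1, 0)),  # arm fully extended
]
elbow_angles = [joint_angle_degrees(s, e, w) for s, e, w in captured]
```

The resulting per-frame angles are the kind of animation parameter that could be stored for later use in driving an articulated character.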

A rendering engine 650 can read in animation sequences, scene descriptions, and object details, as well as rendering engine control inputs, such as a resolution selection and a set of rendering parameters. Resolution selection might be useful for an operator to control a trade-off between speed of rendering and clarity of detail, as speed might be more important than clarity for a movie maker to test some interaction or direction, while clarity might be more important than speed for a movie maker to generate data that will be used for final prints of feature films to be distributed. The rendering engine 650 might include computer processing capabilities, image processing capabilities, one or more processors, program code storage for storing program instructions executable by the one or more processors, as well as user input devices and user output devices, not all of which are shown.

The visual content generation system 600 can also include a merging system 660 that merges live footage with animated content. The live footage might be obtained and input by reading from the live action footage storage 620 to obtain live action footage, by reading from the live action metadata storage 624 to obtain details such as presumed segmentation in captured images segmenting objects in a live action scene from their background (perhaps aided by the fact that the green screen 610 was part of the live action scene), and by obtaining CGI imagery from the rendering engine 650.

The merging system 660 might also read data from rulesets for merging/combining storage 662. A very simple example of a rule in a ruleset might be “obtain a full image including a two-dimensional pixel array from live footage, obtain a full image including a two-dimensional pixel array from the rendering engine 650, and output an image where each pixel is a corresponding pixel from the rendering engine 650 when the corresponding pixel in the live footage is a specific color of green, otherwise output a pixel value from the corresponding pixel in the live footage.”
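
The simple rule quoted above can be sketched as a per-pixel selection. This is a minimal sketch, assuming images are represented as nested lists of RGB tuples and that the "specific color of green" is pure green (0, 255, 0); both the representation and the names `KEY_GREEN` and `merge_green_screen` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

# Assumption: the source only says "a specific color of green".
KEY_GREEN: Pixel = (0, 255, 0)


def merge_green_screen(live: Image, rendered: Image, key: Pixel = KEY_GREEN) -> Image:
    """Output the rendered pixel wherever the live pixel equals the key color,
    and the live pixel everywhere else."""
    same_shape = len(live) == len(rendered) and all(
        len(a) == len(b) for a, b in zip(live, rendered)
    )
    if not same_shape:
        raise ValueError("live and rendered images must have the same dimensions")
    return [
        [cg if lv == key else lv for lv, cg in zip(live_row, cg_row)]
        for live_row, cg_row in zip(live, rendered)
    ]
```

An exact color match is the literal reading of the rule; a practical ruleset would more likely compare against a tolerance range to cope with lighting variation across the green screen.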

The merging system 660 might include computer processing capabilities, image processing capabilities, one or more processors, program code storage for storing program instructions executable by the one or more processors, as well as user input devices and user output devices, not all of which are shown. The merging system 660 might operate autonomously, following programming instructions, or might have a user interface or programmatic interface over which an operator can control a merging process. In some embodiments, an operator can specify parameter values to use in a merging process and/or might specify specific tweaks to be made to an output of the merging system 660, such as modifying boundaries of segmented objects, inserting blurs to smooth out imperfections, or adding other effects. Based on its inputs, the merging system 660 can output an image to be stored in a static image storage 670 and/or a sequence of images in the form of video to be stored in an animated/combined video storage 672.

Thus, as described, the visual content generation system 600 can be used to generate video that combines live action with computer-generated animation using various components and tools, some of which are described in more detail herein. While the visual content generation system 600 might be useful for such combinations, with suitable settings, it can be used for outputting entirely live action footage or entirely CGI sequences. The code may also be provided and/or carried by a transitory computer readable medium, e.g., a transmission medium such as in the form of a signal transmitted over a network.

According to one embodiment, the techniques described herein are implemented by one or more generalized computing systems programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Special-purpose computing devices may be used, such as desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.

For example, FIG. 7 is a block diagram that illustrates a computer system 700 upon which the computer systems of the systems described herein and/or the visual content generation system 600 (see FIG. 6) may be implemented. The computer system 700 includes a bus 702 or other communication mechanism for communicating information, and a processor 704 coupled with the bus 702 for processing information. The processor 704 may be, for example, a general-purpose microprocessor.

The computer system 700 also includes a main memory 706, such as a random-access memory (RAM) or other dynamic storage device, coupled to the bus 702 for storing information and instructions to be executed by the processor 704. The main memory 706 may also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by the processor 704. Such instructions, when stored in non-transitory storage media accessible to the processor 704, render the computer system 700 into a special-purpose machine that is customized to perform the operations specified in the instructions.

The computer system 700 further includes a read only memory (ROM) 708 or other static storage device coupled to the bus 702 for storing static information and instructions for the processor 704. A storage device 710, such as a magnetic disk or optical disk, is provided and coupled to the bus 702 for storing information and instructions.

The computer system 700 may be coupled via the bus 702 to a display 712, such as a computer monitor, for displaying information to a computer user. An input device 714, including alphanumeric and other keys, is coupled to the bus 702 for communicating information and command selections to the processor 704. Another type of user input device is a cursor control 716, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to the processor 704 and for controlling cursor movement on the display 712. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.

The computer system 700 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs the computer system 700 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by the computer system 700 in response to the processor 704 executing one or more sequences of one or more instructions contained in the main memory 706. Such instructions may be read into the main memory 706 from another storage medium, such as the storage device 710. Execution of the sequences of instructions contained in the main memory 706 causes the processor 704 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may include non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as the storage device 710. Volatile media includes dynamic memory, such as the main memory 706. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.

Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire, and fiber optics, including the wires that make up the bus 702. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

Various forms of media may be involved in carrying one or more sequences of one or more instructions to the processor 704 for execution. For example, the instructions may initially be carried on a magnetic disk or solid-state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a network connection. A modem or network interface local to the computer system 700 can receive the data. The bus 702 carries the data to the main memory 706, from which the processor 704 retrieves and executes the instructions. The instructions received by the main memory 706 may optionally be stored on the storage device 710 either before or after execution by the processor 704.

The computer system 700 also includes a communication interface 718 coupled to the bus 702. The communication interface 718 provides a two-way data communication coupling to a network link 720 that is connected to a local network 722. For example, the communication interface 718 may be a network card, a modem, a cable modem, or a satellite modem to provide a data communication connection to a corresponding type of telephone line or communications line. Wireless links may also be implemented. In any such implementation, the communication interface 718 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.

The network link 720 typically provides data communication through one or more networks to other data devices. For example, the network link 720 may provide a connection through the local network 722 to a host computer 724 or to data equipment operated by an Internet Service Provider (ISP) 726. The ISP 726 in turn provides data communication services through the world-wide packet data communication network now commonly referred to as the “Internet” 728. The local network 722 and Internet 728 both use electrical, electromagnetic, or optical signals that carry digital data streams. The signals through the various networks and the signals on the network link 720 and through the communication interface 718, which carry the digital data to and from the computer system 700, are example forms of transmission media.

The computer system 700 can send messages and receive data, including program code, through the network(s), the network link 720, and communication interface 718. In the Internet example, a server 730 might transmit a requested code for an application program through the Internet 728, ISP 726, local network 722, and communication interface 718. The received code may be executed by the processor 704 as it is received, and/or stored in the storage device 710, or other non-volatile storage for later execution.

Operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. Processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs or one or more applications) executing collectively on one or more processors, by hardware or combinations thereof. The code may be stored on a computer-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable storage medium may be non-transitory. The code may also be provided and/or carried by a transitory computer readable medium, e.g., a transmission medium such as in the form of a signal transmitted over a network.

Conjunctive language, such as phrases of the form “at least one of A, B, and C,” or “at least one of A, B and C,” unless specifically stated otherwise or otherwise clearly contradicted by context, is otherwise understood with the context as used in general to present that an item, term, etc., may be either A or B or C, or any nonempty subset of the set of A and B and C. For instance, in the illustrative example of a set having three members, the conjunctive phrases “at least one of A, B, and C” and “at least one of A, B and C” refer to any of the following sets: {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}. Thus, such conjunctive language is not generally intended to imply that certain embodiments require at least one of A, at least one of B and at least one of C each to be present.
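
The seven sets listed above are exactly the nonempty subsets of a three-member set, which can be checked by enumeration. The following short sketch is illustrative only; the function name `nonempty_subsets` is hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Set


def nonempty_subsets(items: Iterable[str]) -> List[Set[str]]:
    """All nonempty subsets, ordered by size and then by input order."""
    pool = list(items)
    return [
        set(combo)
        for size in range(1, len(pool) + 1)
        for combo in combinations(pool, size)
    ]


subsets = nonempty_subsets(["A", "B", "C"])
```

For n members the enumeration yields 2^n - 1 sets, which gives seven for the three-member example.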

The use of examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.

In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what is intended by the applicants to be the scope of the invention, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.

Further embodiments can be envisioned to one of ordinary skill in the art after reading this disclosure. In other embodiments, combinations or sub-combinations of the above-disclosed invention can be advantageously made. The example arrangements of components are shown for purposes of illustration and combinations, additions, re-arrangements, and the like are contemplated in alternative embodiments of the present invention. Thus, while the invention has been described with respect to exemplary embodiments, one skilled in the art will recognize that numerous modifications are possible.

For example, the processes described herein may be implemented using hardware components, software components, and/or any combination thereof. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims and that the invention is intended to cover all modifications and equivalents within the scope of the following claims.

All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.

What is claimed is:
 1. A computer-implemented method of processing images from a main imaging device using capture device inputs from capture devices, the method comprising: receiving reference stage parameters and normalization parameters associated with the capture device; storing the normalization parameters; receiving main input from the main imaging device; receiving the capture device inputs from the capture devices; normalizing the capture device inputs using normalization parameters to form normalized captured inputs; aligning two or more normalized captured inputs; obtaining reconstruction selections; reconstructing a plate based on the selections, the normalized aligned captured inputs, and the reconstruction selection to form a reconstructed plate; and generating modified video content using the main input and the reconstructed plate.
 2. The method of claim 1, wherein the main imaging device is a camera, the method further comprising replacing pixels in digitized video content captured with the camera to modify a captured scene in the digitized video content, wherein pixel color values of replaced pixels are determined based, at least in part, on the reconstruction selections and the capture device inputs.
 3. The method of claim 2, further comprising: determining which objects in the captured scene are part of a plate and which objects are to be removed from the captured scene; and using results of determining for plate reconstruction.
 4. The method of claim 1, wherein the reference stage parameters relate at least in part to what capture devices are being used and differences between the capture devices and the main imaging device.
 5. The method of claim 1, wherein the capture devices comprise one or more of a first camera having a resolution different than the main imaging device and/or a second camera optimized for a spectrum different than that of the main imaging device.
 6. The method of claim 1, further comprising configuring the capture devices based on the normalization parameters.
 7. An image processing system comprising: a first storage for a main scene capture, captured from a main imaging device; a normalizer for normalizing inputs from capture devices; a second storage for normalized capture data, output by the normalizer; a preprocessor that processes reference inputs, reference stage parameters, and capture device settings and that generates normalizing parameters; a third storage for the normalizing parameters, coupled to an input of the normalizer; and a machine-learning reconstructor that generates reconstructed imagery based on reconstruction parameters and reconstruction input selection.
 8. The image processing system of claim 7, wherein the first storage, the second storage, and the third storage are portions of a shared computer memory storage system.
 9. A computer system for processing digital video, the system comprising: at least one processor; and a computer-readable medium storing instructions, which when executed by the at least one processor, causes the system to carry out the method of claim 1.
 10. A non-transitory computer-readable storage medium storing instructions, which when executed by the at least one processor of a computer system, causes the computer system to carry out the method of claim 1.
 11. A computer system comprising: one or more processors; and a storage medium storing instructions, which when executed by the at least one processor, cause the system to implement the method of claim 1.
 12. A carrier medium carrying reconstructed plate image information generated according to the method of claim 1.